AITopics

Country: North America > United States > California > Santa Clara County > Palo Alto (0.04)

Genre: Research Report > Experimental Study (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)

Neural Information Processing Systems, Dec-23-2025, 20:07:34 GMT

What Neural Networks Memorize and Why: Discovering the Long Tail via Influence Estimation

Deep learning algorithms are well known for fitting the training data very well, often fitting even outliers and mislabeled data points. Such fitting requires memorization of training data labels, a phenomenon that has attracted significant research interest but has not been given a compelling explanation so far. A recent work of Feldman (2019) proposes a theoretical explanation for this phenomenon based on a combination of two insights. First, natural image and data distributions are (informally) known to be long-tailed, that is, they have a significant fraction of rare and atypical examples. Second, in a simple theoretical model such memorization is necessary for achieving close-to-optimal generalization error when the data distribution is long-tailed.
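As a rough illustration of what "memorizing a label" can mean, the sketch below computes a leave-one-out memorization score: how much the chance of predicting example i's own label drops when example i is removed from the training set. It assumes that leave-one-out reading of memorization, and it swaps in a deterministic 1-nearest-neighbour classifier for a neural network so that it runs instantly. The function names and the toy data are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def predict_1nn(train_x: List[Point], train_y: List[int], query: Point) -> int:
    """Return the label of the training point closest to `query` (squared Euclidean distance)."""
    best = min(
        range(len(train_x)),
        key=lambda j: (train_x[j][0] - query[0]) ** 2 + (train_x[j][1] - query[1]) ** 2,
    )
    return train_y[best]


def memorization_scores(xs: List[Point], ys: List[int]) -> List[float]:
    """Leave-one-out score per example: accuracy on (x_i, y_i) with i in the
    training set minus accuracy on (x_i, y_i) with i held out.
    The learner here is deterministic, so each probability is 0 or 1."""
    scores = []
    for i in range(len(xs)):
        with_i = 1.0 if predict_1nn(xs, ys, xs[i]) == ys[i] else 0.0
        rest_x = xs[:i] + xs[i + 1:]
        rest_y = ys[:i] + ys[i + 1:]
        without_i = 1.0 if predict_1nn(rest_x, rest_y, xs[i]) == ys[i] else 0.0
        scores.append(with_i - without_i)
    return scores


# Toy data (hypothetical): two tight clusters plus one atypical point
# labelled 1 that sits next to the class-0 cluster.
xs = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
      (1.0, 1.0), (1.1, 1.0), (1.0, 1.1),
      (0.2, 0.2)]
ys = [0, 0, 0, 1, 1, 1, 1]
scores = memorization_scores(xs, ys)
print(scores)
```

Only the atypical last point gets a score of 1: its label can be predicted correctly only when the point itself is in the training set, while the typical points are still classified correctly by their neighbours after removal. For a randomized learner such as SGD-trained networks, each 0/1 indicator would be replaced by an average over many training runs.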

discovering, name change, neural network memorize, (11 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.59)

Panda, Subhodip, Tarsadiya, Dhruv, Sourav, Shashwat, P, Prathosh A., Karimireddy, Sai Praneeth

f-INE: A Hypothesis Testing Framework for Estimating Influence under Training Randomness

arXiv.org Artificial Intelligence, Oct-14-2025

Influence estimation methods promise to explain and debug machine learning by estimating the impact of individual samples on the final model. Yet, existing methods collapse under training randomness: the same example may appear critical in one run and irrelevant in the next. Such instability undermines their use in data curation or cleanup since it is unclear if we indeed deleted/kept the correct datapoints. To overcome this, we introduce *f-influence* -- a new influence estimation framework grounded in hypothesis testing that explicitly accounts for training randomness, and establish desirable properties that make it suitable for reliable influence estimation. We also design a highly efficient algorithm **f**-**IN**fluence **E**stimation (**f-INE**) that computes f-influence **in a single training run**. Finally, we scale up f-INE to estimate influence of instruction tuning data on Llama-3.1-8B and show it can reliably detect poisoned samples that steer model opinions, demonstrating its utility for data cleanup and attributing model behavior.

data mining, machine learning, natural language, (18 more...)

2510.1051

Country: North America > United States > California (0.46)

Genre: Research Report (0.40)

Industry: Government (0.47)

Technology:

Information Technology > Data Science > Data Quality (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(2 more...)

arXiv.org Artificial Intelligence, Sep-24-2025

Graph Neural Networks with Similarity-Navigated Probabilistic Feature Copying

Hevapathige, Asela

Graph Neural Networks (GNNs) have demonstrated remarkable success across various graph-based tasks. However, they face some fundamental limitations: feature oversmoothing can cause node representations to become indistinguishable in deeper networks, they struggle to effectively manage heterogeneous relationships where connected nodes differ significantly, and they process entire feature vectors as indivisible units, which limits flexibility. We propose AxelGNN, a novel GNN architecture inspired by Axelrod's cultural dissemination model that addresses these limitations through a unified framework. AxelGNN incorporates similarity-gated probabilistic interactions that adaptively promote convergence or divergence based on node similarity, implements trait-level copying mechanisms for fine-grained feature aggregation at the segment level, and maintains global polarization to preserve node distinctiveness across multiple representation clusters. The model's bistable convergence dynamics naturally handle both homophilic and heterophilic graphs within a single architecture. Extensive experiments on node classification and influence estimation benchmarks demonstrate that AxelGNN consistently outperforms or matches state-of-the-art GNN methods across diverse graph structures with varying homophily-heterophily characteristics.

artificial intelligence, graph neural network, machine learning, (10 more...)

2509.19084

Country: North America > United States (0.14)

Genre: Research Report (1.00)

Industry: Information Technology (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

arXiv.org Machine Learning, May-27-2025

Towards Robust Influence Functions with Flat Validation Minima

Ye, Xichen, Wu, Yifan, Zhang, Weizhong, Jin, Cheng, Chen, Yifan

The Influence Function (IF) is a widely used technique for assessing the impact of individual training samples on model predictions. However, existing IF methods often fail to provide reliable influence estimates in deep neural networks, particularly when applied to noisy training data. This issue does not stem from inaccuracies in parameter change estimation, which has been the primary focus of prior research, but rather from deficiencies in loss change estimation, specifically due to the sharpness of validation risk. In this work, we establish a theoretical connection between influence estimation error, validation set risk, and its sharpness, underscoring the importance of flat validation minima for accurate influence estimation. Furthermore, we introduce a novel estimation form of Influence Function specifically designed for flat validation minima. Experimental results across various tasks validate the superiority of our approach.
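The two estimation steps that the abstract distinguishes can be made concrete with the standard first-order influence function. The block below is the textbook formulation, not the new estimator proposed in this paper; the symbols (per-sample loss L, training points z_i, empirical risk minimizer, Hessian H) are assumptions introduced here for illustration.

```latex
% Setup: empirical risk minimizer and its Hessian
\hat\theta = \arg\min_{\theta} \frac{1}{n}\sum_{i=1}^{n} L(z_i,\theta),
\qquad
H = \frac{1}{n}\sum_{i=1}^{n} \nabla_\theta^{2} L(z_i,\hat\theta)

% Step 1 (parameter change): effect of removing training point z
\hat\theta_{-z} - \hat\theta \;\approx\; \frac{1}{n}\, H^{-1}\, \nabla_\theta L(z,\hat\theta)

% Step 2 (loss change): first-order expansion of the validation loss
L(z_{\mathrm{val}},\hat\theta_{-z}) - L(z_{\mathrm{val}},\hat\theta)
\;\approx\;
\frac{1}{n}\, \nabla_\theta L(z_{\mathrm{val}},\hat\theta)^{\top} H^{-1}\, \nabla_\theta L(z,\hat\theta)
```

Step 2 is a linearization of the validation loss around the trained parameters, so its accuracy depends on how sharply that loss curves there. This is the part of the estimate that the abstract ties to the sharpness of the validation risk.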

artificial intelligence, machine learning, val, (17 more...)

2505.19097

Country:

Europe > Austria > Vienna (0.14)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
(8 more...)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

Terashita, Naoyuki, Ohashi, Hiroki, Hara, Satoshi

Data Cleansing for GANs

arXiv.org Machine Learning, Apr-1-2025

As the application of generative adversarial networks (GANs) expands, it becomes increasingly critical to develop a unified approach that improves performance across various generative tasks. One effective strategy that applies to any machine learning task is identifying harmful instances, whose removal improves the performance. While previous studies have successfully estimated these harmful training instances in supervised settings, their approaches are not easily applicable to GANs. The challenge lies in two requirements of the previous approaches that do not apply to GANs. First, previous approaches require that the absence of a training instance directly affects the parameters. However, in the training for GANs, the instances do not directly affect the generator's parameters since they are only fed into the discriminator. Second, previous approaches assume that the change in loss directly quantifies the harmfulness of the instance to a model's performance, while common types of GAN losses do not always reflect the generative performance. To overcome the first challenge, we propose influence estimation methods that use the Jacobian of the generator's gradient with respect to the discriminator's parameters (and vice versa). Such a Jacobian represents the indirect effect between two models: how removing an instance from the discriminator's training changes the generator's parameters. To overcome the second challenge, we propose an instance evaluation scheme that measures the harmfulness of each training instance based on how a GAN evaluation metric (e.g., Inception score) is expected to change by the instance's removal. Furthermore, we demonstrate that removing the identified harmful instances significantly improves the generative performance on various GAN evaluation metrics.

artificial intelligence, inductive learning, machine learning, (18 more...)

doi: 10.1109/TNNLS.2025.3529540

2504.00603

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
Asia > Japan > Honshū > Kansai > Osaka Prefecture > Osaka (0.04)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

arXiv.org Artificial Intelligence, Feb-2-2025

DiffIM: Differentiable Influence Minimization with Surrogate Modeling and Continuous Relaxation

Lee, Junghun, Kim, Hyunju, Bu, Fanchen, Ko, Jihoon, Shin, Kijung

In social networks, people influence each other through social links, which can be represented as propagation among nodes in graphs. Influence minimization (IMIN) is the problem of manipulating the structures of an input graph (e.g., removing edges) to reduce the propagation among nodes. IMIN can represent time-critical real-world applications, such as rumor blocking, but IMIN is theoretically difficult and computationally expensive. Moreover, the discrete nature of IMIN hinders the usage of powerful machine learning techniques, which require differentiable computation. In this work, we propose DiffIM, a novel method for IMIN with two differentiable schemes for acceleration: (1) surrogate modeling for efficient influence estimation, which avoids time-consuming simulations (e.g., Monte Carlo), and (2) the continuous relaxation of decisions, which avoids the evaluation of individual discrete decisions (e.g., removing an edge). We further propose a third accelerating scheme, gradient-driven selection, that chooses edges instantly based on gradients without optimization (specifically, gradient descent iterations) on each test instance. Through extensive experiments on real-world graphs, we show that each proposed scheme significantly improves speed with little (or even no) IMIN performance degradation. Our method is Pareto-optimal (i.e., no baseline is faster and more effective than it) and typically several orders of magnitude (specifically, up to 15,160X) faster than the most effective baseline while being more effective.
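To show why simulation-based influence estimation is costly, here is a minimal sketch of the Monte Carlo baseline that a surrogate model would replace. It assumes an independent cascade diffusion model, which the abstract does not name. The function names, the edge-probability dictionary and the toy graph are all hypothetical and are not part of DiffIM.

```python
import random
from typing import Dict, Iterable, List, Tuple

Edge = Tuple[int, int]


def simulate_cascade(adj: Dict[int, List[Tuple[int, float]]],
                     seeds: Iterable[int],
                     rng: random.Random) -> int:
    """Run one independent-cascade simulation and return the number of activated nodes.
    Each newly activated node gets one chance to activate each inactive out-neighbour."""
    active = set(seeds)
    frontier = list(active)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v, p in adj.get(u, []):
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return len(active)


def estimate_spread(edges: Dict[Edge, float],
                    seeds: Iterable[int],
                    runs: int = 1000,
                    seed: int = 0) -> float:
    """Monte Carlo estimate of the expected number of activated nodes."""
    adj: Dict[int, List[Tuple[int, float]]] = {}
    for (u, v), p in edges.items():
        adj.setdefault(u, []).append((v, p))
    seeds = list(seeds)
    rng = random.Random(seed)
    return sum(simulate_cascade(adj, seeds, rng) for _ in range(runs)) / runs


# Toy chain 0 -> 1 -> 2 -> 3 with certain propagation (hypothetical data).
edges = {(0, 1): 1.0, (1, 2): 1.0, (2, 3): 1.0}
full = estimate_spread(edges, [0], runs=10)

# Candidate IMIN decision: remove edge (1, 2) and re-estimate.
pruned = {e: p for e, p in edges.items() if e != (1, 2)}
reduced = estimate_spread(pruned, [0], runs=10)
print(full, reduced)
```

Scoring a single candidate edge removal this way takes a full batch of simulations, and a discrete search would repeat that for many candidates, which is the cost that surrogate modeling and continuous relaxation aim to avoid.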

artificial intelligence, effectiveness, machine learning, (20 more...)

2502.01031

Country:

North America > United States > Virginia (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
Asia > South Korea > Daejeon > Daejeon (0.04)

Genre: Research Report > Promising Solution (0.34)

Industry: Information Technology (0.50)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.48)

Neural Information Processing Systems, Jan-22-2025, 09:20:11 GMT

Review for NeurIPS paper: What Neural Networks Memorize and Why: Discovering the Long Tail via Influence Estimation

Weaknesses: I would like to see some clarification on the long tail theory. If the value of mem(A,S,i_1,...,i_k) is high, perhaps we can still call this phenomenon "memorization." If so, then the memorization phenomenon is not limited to long tails. It then seems to me that the claim in [12] that memorization is needed due to the long tail may not show the bigger picture. The paper mentions that very high influence scores are due to near duplicates in the training and test examples.

influence estimation, neural network memorize, neurips paper, (8 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.40)

Neural Information Processing Systems, Jan-22-2025, 09:20:04 GMT

Review for NeurIPS paper: What Neural Networks Memorize and Why: Discovering the Long Tail via Influence Estimation

The reviewers feel that the issues are interesting and the contributions are sufficient for acceptance. However, there are serious suggestions for improvements in the experiments. It seems the paper is suggestive, but not definitive, on the long tail hypothesis.

influence estimation, neural network memorize, neurips paper, (2 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.40)

arXiv.org Artificial Intelligence, Dec-11-2024

DMin: Scalable Training Data Influence Estimation for Diffusion Models

Lin, Huawei, Lao, Yingjie, Zhao, Weijie

Identifying the training data samples that most influence a generated image is a critical task in understanding diffusion models, yet existing influence estimation methods are constrained to small-scale or LoRA-tuned models due to computational limitations. As diffusion models scale up, these methods become impractical. To address this challenge, we propose DMin (Diffusion Model influence), a scalable framework for estimating the influence of each training data sample on a given generated image. By leveraging efficient gradient compression and retrieval techniques, DMin reduces storage requirements from 339.39 TB to only 726 MB and retrieves the top-k most influential training samples in under 1 second, all while maintaining performance. Our empirical results demonstrate DMin is both effective in identifying influential training samples and efficient in terms of computational and storage requirements.

artificial intelligence, diffusion model, machine learning, (16 more...)

2412.08637

Country:

Europe > Austria > Vienna (0.14)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
(9 more...)

Genre: Research Report > New Finding (0.66)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)